Suzuki, Taiji - Mathematics of Deep Learning
https://gyazo.com/9eb4355e2d576490e8ed810b29f2d732
hillbig Theoretical analysis of deep learning by Dr. Taiji Suzuki, covering representation power, generalization, and optimization theory. It spans a wide range of important topics, including the recent Neural Tangent Kernel and the double descent phenomenon. I don't think anything this comprehensive exists in English.
Daisuke Okanohara
I get an error when accessing the original SlideShare, but it is still viewable on X/Twitter. A cached copy, perhaps?
https://gyazo.com/67a306cd676616557e1b7f2aef5f4a46
https://gyazo.com/91bb95807b70a093ab6e502e77183bfb
Kolmogorov's superposition theorem (statement below)
universal approximation
Ridgelet transform
Representation power and the number of layers
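For reference, the superposition theorem above states that any continuous function on [0,1]^n decomposes into sums and compositions of continuous univariate functions:

$$f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)$$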
https://gyazo.com/81c47eba5249bf7a67c78a1f9c059646
As an easy-to-understand concrete example: for a function whose value is determined by the distance from the origin, four layers suffice with only polynomially many units in the input dimension (frankly, I think it is linear).
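As a toy sketch of the universal approximation idea (my own construction, not from the slides; the target function, widths, and random-feature setup are arbitrary choices): a one-hidden-layer ReLU network with random hidden weights is fit by least squares on the output layer, and the error typically shrinks as the width grows.

```python
# Toy universal-approximation sketch (not from the slides): one hidden ReLU layer
# with random weights, only the output layer trained by least squares.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x)                      # arbitrary target function

for width in [5, 20, 100]:
    w = rng.normal(size=width)                 # random hidden weights
    b = rng.uniform(-1.0, 1.0, size=width)     # random hidden biases
    H = np.maximum(0.0, np.outer(x, w) + b)    # hidden features, shape (200, width)
    a, *_ = np.linalg.lstsq(H, y, rcond=None)  # fit output layer only
    print(f"width={width:4d}  sup error={np.max(np.abs(H @ a - y)):.4f}")
```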
Kernel Method and Ridge Regression
reproducing kernel Hilbert space
Kernel ridge regression is then restated in terms of reproducing kernel Hilbert spaces, but I'll skip that part.
Deep learning can be interpreted as learning the kernel function itself from the data.
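A minimal kernel ridge regression sketch (my illustration; the Gaussian kernel, bandwidth, and regularization strength are arbitrary choices):

```python
# Minimal kernel ridge regression with a Gaussian (RBF) kernel; bandwidth gamma
# and regularization lam are arbitrary choices for illustration.
import numpy as np

def rbf_kernel(a, b, gamma=10.0):
    # k(s, t) = exp(-gamma * (s - t)^2) for scalar inputs
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(1)
x_train = rng.uniform(0.0, 1.0, 30)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.normal(size=30)

lam = 1e-3
K = rbf_kernel(x_train, x_train)
# representer theorem: the solution lives in the span of k(., x_i)
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

x_test = np.linspace(0.0, 1.0, 5)
print(rbf_kernel(x_test, x_train) @ alpha)     # predictions at test points
```

The contrast drawn in the slides: here the kernel, and hence the feature map, is fixed before seeing the data, while deep learning in effect learns it.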
https://gyazo.com/559eee2c77c2faabcbfb3bb4388510ed
...
https://gyazo.com/80bc5f45abdf998da8cc85537a1eccd5
double descent (a toy demo follows below)
implicit regularization
Generalization error bound
this part is skipped
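A toy demo of the double descent mentioned above (my own construction, not from the slides; random ReLU features with a minimum-norm least-squares fit, all sizes arbitrary). The test error typically peaks near the interpolation threshold, where the number of features p equals the number of samples n, and falls again beyond it.

```python
# Toy double-descent demo (my construction): minimum-norm least squares on
# random ReLU features; watch the test error around p == n.
import numpy as np

rng = np.random.default_rng(2)
n, n_test = 40, 500
target = lambda t: np.sin(3.0 * t)
x, x_test = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n_test)
y, y_test = target(x) + 0.1 * rng.normal(size=n), target(x_test)

def features(t, w, b):
    return np.maximum(0.0, np.outer(t, w) + b)   # random ReLU feature map

for p in [10, 30, 40, 50, 200, 1000]:
    w, b = rng.normal(size=p), rng.uniform(-1.0, 1.0, p)
    beta = np.linalg.pinv(features(x, w, b)) @ y          # minimum-norm fit
    mse = np.mean((features(x_test, w, b) @ beta - y_test) ** 2)
    print(f"p={p:5d}  test MSE={mse:.3f}")
```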
Approximation performance by function class
piecewise smooth function
https://gyazo.com/c00662aa291b6c2f43a9bb31b8971516
mixed-smoothness
https://gyazo.com/ab6fcaef4375e2fd5cb35ac9b67ce914
https://gyazo.com/12f8bab940df517a48c17bc1790f6794
https://gyazo.com/71e2a50728f2efb8eb73288bcf086f6e
kernel ridge regression
adaptive method
deep learning
sparse estimation
I guess if too many candidate functions have to be prepared in advance, the method becomes impractical.
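A toy sketch of the adaptive/sparse idea (my illustration, not from the slides): Lasso solved by ISTA selects a small number of atoms out of a large dictionary prepared in advance, and that dictionary is exactly the part that becomes impractical when it has to be too large.

```python
# Toy sparse estimation: Lasso solved by ISTA picks a few atoms from a large
# dictionary prepared in advance (all sizes and lam are arbitrary choices).
import numpy as np

rng = np.random.default_rng(3)
n, p, k = 50, 400, 5
D = rng.normal(size=(n, p)) / np.sqrt(n)       # dictionary of p candidate atoms
beta_true = np.zeros(p)
beta_true[rng.choice(p, size=k, replace=False)] = rng.normal(size=k)
y = D @ beta_true + 0.01 * rng.normal(size=n)

lam = 0.01
step = 1.0 / np.linalg.norm(D, 2) ** 2         # 1 / Lipschitz const. of gradient
beta = np.zeros(p)
for _ in range(2000):                          # ISTA: gradient step + soft threshold
    z = beta - step * (D.T @ (D @ beta - y))
    beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

print("atoms selected:", np.count_nonzero(np.abs(beta) > 1e-6), "out of", p)
```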
Besov space
https://gyazo.com/0e64538bcf2dbcaa22bc9756d52f21cc
https://gyazo.com/b109a40895945675dc80d9a21dd3262d
https://gyazo.com/b8de7edc8b5ffb8bdc6e4645cf40d904
The various function classes mentioned in the discussion so far are special cases of the [Besov space].
https://gyazo.com/1e444edc999820b1513eed7128bc05a0
https://gyazo.com/b6700e9f6e38b23a4a2d998ff3096101
https://gyazo.com/417c9adfc08d6f1968519f715a76e1d9
→Sparsity.
https://gyazo.com/73f00defbc6ede271105ea5fa0fe2372
Deep NNs can approximate elements of Besov spaces
Cardinal B-splines can be well approximated by ReLU NNs
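For the lowest order this is exact, which is worth seeing once: the piecewise-linear cardinal B-spline, i.e. the tent function, is literally relu(x) - 2*relu(x - 1) + relu(x - 2); higher-order B-splines are piecewise polynomials and are only approximated.

```python
# The tent (hat) function -- the piecewise-linear cardinal B-spline -- is
# *exactly* relu(x) - 2*relu(x - 1) + relu(x - 2); check numerically.
import numpy as np

relu = lambda t: np.maximum(0.0, t)

def hat(x):                                    # 0 outside [0, 2], peak 1 at x = 1
    return np.where(x < 1.0, np.clip(x, 0.0, 1.0), np.clip(2.0 - x, 0.0, 1.0))

x = np.linspace(-1.0, 3.0, 1001)
lhs = relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)
print("max deviation:", np.max(np.abs(lhs - hat(x))))   # 0.0 up to float error
```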
https://gyazo.com/ff1bdd3dac6e15926b05534bd189f812
https://gyazo.com/e81cdac1d099c519afd93056d04d7ca8
Deep learning is superior when spatial smoothness is non-uniform
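A toy analogy for this point (my construction; adaptivity is played by knot placement rather than by an actual network): piecewise-linear interpolation of f(x) = sqrt(|x - 0.3|) with the same knot budget, uniform versus graded toward the kink. The graded knots win by a wide margin because the function is smooth away from a single bad point, which is the non-uniform-smoothness regime.

```python
# Same knot budget, two placements: uniform knots vs. knots graded toward the
# kink of f(x) = sqrt(|x - 0.3|).
import numpy as np

f = lambda x: np.sqrt(np.abs(x - 0.3))
xs = np.linspace(0.0, 1.0, 100001)             # fine grid for measuring error

def sup_err(knots):
    return np.max(np.abs(np.interp(xs, knots, f(knots)) - f(xs)))

n = 33
uniform = np.linspace(0.0, 1.0, n)
t = np.linspace(-1.0, 1.0, n)                  # cubic grading clusters knots at 0.3
graded = np.unique(np.concatenate(([0.0, 1.0], np.clip(0.3 + 0.7 * t**3, 0.0, 1.0))))

print("uniform knots:", sup_err(uniform))
print("graded knots :", sup_err(graded))
```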
Mixed-smooth Besov space
A non-stochastic (deterministic) gradient method can take exponential time to escape saddle points.
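A toy view of this claim (my construction, not from the slides): on f(x, y) = x^2 - y^2, deterministic gradient descent started a distance eps from the stable manifold {y = 0} needs on the order of log(1/eps) steps to escape, so a trajectory that approaches saddles exponentially closely pays an exponential total cost, and eps = 0 never escapes.

```python
# Gradient descent on f(x, y) = x^2 - y^2: the saddle at the origin repels along
# y, but escape from distance eps takes ~log(1/eps) steps, and eps = 0 never
# escapes (capped at 10,000 iterations here).
import numpy as np

eta = 0.1
for eps in [1e-4, 1e-8, 1e-16, 0.0]:
    x, y, steps = 1.0, eps, 0
    while abs(y) < 1.0 and steps < 10_000:
        x, y = x - eta * 2.0 * x, y + eta * 2.0 * y   # step along minus the gradient
        steps += 1
    print(f"eps={eps:.0e}  steps to escape: {steps}")
```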
https://gyazo.com/d7195cb5e0d60b3058f8715d9a77ec60
Neural Tangent Kernel
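A minimal empirical NTK sketch (my construction): for a one-hidden-layer ReLU net f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j * x), the tangent kernel is K(x, x') = <grad_theta f(x), grad_theta f(x')>; as the width m grows it concentrates around a deterministic limit, which is what makes wide-network training kernel-like.

```python
# Empirical NTK of f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j * x) at two fixed
# scalar inputs: samples of K(x1, x2) concentrate as the width m grows.
import numpy as np

def empirical_ntk(x1, x2, w, a):
    m = w.size
    h1, h2 = np.maximum(0.0, w * x1), np.maximum(0.0, w * x2)   # hidden activations
    d1, d2 = (w * x1 > 0).astype(float), (w * x2 > 0).astype(float)
    ka = h1 @ h2 / m                            # part from gradients w.r.t. a_j
    kw = (a * d1 * x1) @ (a * d2 * x2) / m      # part from gradients w.r.t. w_j
    return ka + kw

rng = np.random.default_rng(4)
x1, x2 = 0.7, -0.3
for m in [100, 10_000, 1_000_000]:
    vals = [empirical_ntk(x1, x2, rng.normal(size=m), rng.normal(size=m))
            for _ in range(5)]
    print(f"m={m:8d}  K(x1, x2) samples:", np.round(vals, 3))
```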
Mean Field
https://gyazo.com/38423f31f4c743520a0977cece288c06
Wasserstein distance
---
This page is auto-translated from /nishio/鈴木大慈-深層学習の数理 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.